The complexity of unobservable finite-horizon Markov decision processes
Abstract
Markov Decision Processes (MDPs) model controlled stochastic systems. Like Markov chains, an MDP consists of states and probabilistic transitions; unlike Markov chains, there is assumed to be an outside controller who chooses an action (with its associated transition matrix) at each step of the process, according to some strategy or policy. In addition, each state and action pair has an associated reward. The goal of the controller is to maximize the expected reward. MDPs are used in applications as diverse as wildlife management and robot navigation control. Optimization and approximation strategies for these models constitute a major body of literature in mathematics, operations research, and engineering. We consider the complexity of the following decision problem: for a given MDP and type of policy, is there such a policy for that MDP with positive expected reward? The complexity of this problem depends on at least half a dozen factors, including the information available to the controller, the feedback mechanism, and the succinctness of the representation of the system relative to the number of states. This paper, together with [6], shows variations of the decision problem to be complete for NL, and [6] also shows that some NP-complete problems are not ε-approximable. This paper focuses on the proofs of completeness for PL, PP, and NP^PP. All of the problems considered here are for MDPs that run for a fixed, finite time, either equal to the number of states of the system or the size of the representation of the system. Papadimitriou and Tsitsiklis showed that the most straightforward MDP decision problem is P-complete, and other variants are NP- or PSPACE-complete [13]. Several others have shown other MDP problems to be complete for P, NP, PSPACE, or EXP [12, 4, 9, 10]. We consider a slightly different decision problem than they do; our decision problem allows self-reductions to find the optimal policy, whereas theirs does not.
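For intuition about the unobservable setting, here is a minimal Python sketch; the toy instance and all names are illustrative assumptions, not taken from the paper. When the controller cannot observe the state, a policy is just a fixed sequence of actions; its expected reward is computed by propagating the state distribution forward, and the decision problem asks whether some sequence achieves positive expected reward. Exhaustive search over action sequences is exponential, consistent with the hardness results above.

import itertools
import numpy as np

def expected_reward(policy, transitions, rewards, start_dist):
    """policy: sequence of action indices; transitions[a][s, s'] = Pr(s'|s,a);
    rewards[a][s] = reward for taking action a in state s."""
    dist, total = np.asarray(start_dist, dtype=float), 0.0
    for a in policy:
        total += dist @ rewards[a]      # expected immediate reward
        dist = dist @ transitions[a]    # push the distribution one step forward
    return total

def exists_positive(transitions, rewards, start_dist, horizon):
    """Brute-force decision procedure: does some length-`horizon` action
    sequence have positive expected reward?"""
    n_actions = len(transitions)
    return any(expected_reward(p, transitions, rewards, start_dist) > 0
               for p in itertools.product(range(n_actions), repeat=horizon))

# Toy 2-state, 2-action instance; horizon = number of states.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.0, 1.0]])]   # action 1
R = [np.array([-1.0, 2.0]), np.array([0.5, -0.5])]
print(exists_positive(P, R, start_dist=[1.0, 0.0], horizon=2))  # True here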
Similar Resources
The Complexity of Policy Evaluation for Finite-Horizon Partially-Observable Markov Decision Processes
A partially-observable Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. POMDPs are used to model controlled stochastic processes, from health care to manufacturing control processes (see [19] for more examples). We consider several flavors of finite-horizon POMDPs. Our results concern the complexity …
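The following small sketch (Python; the function names and observation model are assumptions for illustration, not a construction from this paper) shows the standard Bayes-rule belief update by which a POMDP controller summarizes its incomplete information about the hidden state:

import numpy as np

def belief_update(belief, action, obs, transitions, obs_model):
    """transitions[a][s, s'] = Pr(s'|s,a); obs_model[a][s', o] = Pr(o|s',a).
    Returns the posterior distribution Pr(s' | belief, action, obs)."""
    predicted = belief @ transitions[action]          # predict the next state
    unnorm = predicted * obs_model[action][:, obs]    # weight by observation likelihood
    return unnorm / unnorm.sum()                      # renormalize to a distribution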
Good Policies for Partially-Observable Markov Decision Processes Are Hard to Find
Optimal policy computation in finite-horizon Markov decision processes is a classical problem in optimization with many practical applications. For stationary policies and infinite horizon it is known to be solvable in polynomial time by linear programming, whereas for finite horizon it is a longstanding open problem. We consider this problem for a slightly generalized model, namely partially-observable …
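For the polynomial-time linear-programming approach mentioned in this abstract (stationary policies, discounted infinite horizon), a minimal sketch follows; the instance layout and names are illustrative assumptions. It uses the classical LP: minimize Σ_s V(s) subject to V(s) ≥ R(s,a) + γ Σ_{s'} P(s'|s,a) V(s') for every pair (s, a).

import numpy as np
from scipy.optimize import linprog

def lp_value(transitions, rewards, gamma):
    """transitions[a][s, s'] = Pr(s'|s,a); rewards[a][s] = reward; 0 < gamma < 1.
    Returns the optimal discounted value function as the LP solution."""
    n_actions, n_states = len(transitions), transitions[0].shape[0]
    # linprog solves min c @ V subject to A_ub @ V <= b_ub; rewrite each
    # constraint V >= R[a] + gamma*P[a] @ V as (gamma*P[a] - I) @ V <= -R[a].
    A = np.vstack([gamma * transitions[a] - np.eye(n_states)
                   for a in range(n_actions)])
    b = np.concatenate([-rewards[a] for a in range(n_actions)])
    res = linprog(c=np.ones(n_states), A_ub=A, b_ub=b, bounds=(None, None))
    return res.x  # bounds=(None, None) since values may be negative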
Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes
This paper analyzes a connection between risk-sensitive and minimax criteria for discrete-time, finite-state Markov Decision Processes (MDPs). We synthesize optimal policies with respect to both criteria, both for finite-horizon and discounted infinite-horizon problems. A generalized decision-making framework is introduced, which includes as special cases a number of approaches that have been considered …
Risk-Sensitive, Minimax, and Mixed Risk-Neutral/Minimax Control of Markov Decision Processes
This paper analyzes a connection between risk-sensitive and minimax criteria for discrete-time, finite-state Markov Decision Processes (MDPs). We synthesize optimal policies with respect to both criteria, both for finite-horizon and discounted infinite-horizon problems. A generalized decision-making framework is introduced, leading to stationary risk-sensitive and minimax optimal policies on the infinite …
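As a rough illustration of the connection these two abstracts describe, here is a hedged Python sketch (names and formulation are assumptions, not the papers' framework) of one backward-induction step under each criterion, for costs to be minimized: the exponential-utility (risk-sensitive) recursion and the worst-case (minimax) recursion. As the risk parameter theta grows, the risk-sensitive value approaches the minimax value.

import numpy as np

def risk_sensitive_step(v, transitions, costs, theta):
    """One step of V(s) = min_a [ c(s,a) + (1/theta) * log E[exp(theta * V)] ]."""
    q = np.array([costs[a] + np.log(transitions[a] @ np.exp(theta * v)) / theta
                  for a in range(len(transitions))])
    return q.min(axis=0)

def minimax_step(v, transitions, costs):
    """One step of V(s) = min_a [ c(s,a) + max over reachable successors of V ]."""
    q = np.array([costs[a] + np.max(np.where(transitions[a] > 0, v, -np.inf), axis=1)
                  for a in range(len(transitions))])
    return q.min(axis=0)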
Publication date: 1996